Evaluating the utility of effective breeding size estimates for monitoring sea lamprey spawning abundance

Abstract Sea lamprey (Petromyzon marinus) is an invasive species that is a significant source of mortality for populations of valued fish species across the North American Great Lakes. Large annual control programs are needed to reduce the species' impacts; however, the number of successfully spawning adults cannot currently be accurately assessed. In this study, effective breeding size (N b) and the minimum number of spawning adults (N s) were estimated for larval cohorts from 17 tributaries across all five Great Lakes using single nucleotide polymorphisms (SNP) genotyped via RAD‐capture sequencing. Reconstructed larval pedigrees showed substantial variability in the size and number of full‐ and half‐sibling groups, N b (<1–367), and N s (5–545) among streams. Generalized linear models examining the effects of stream environmental characteristics and aspects of sampling regimes on N b and N s estimates identified sample size, the number of sampling sites, and drainage area as important factors predicting N b and N s. Correlations between N b, N s, and capture–mark–recapture estimates of adult census size (N c) increased when streams with small sample sizes (n < 50) were removed. Results collectively indicate that parameters estimated from genetic data can provide valuable information on spawning adults in a river system, especially if sampling regimes are standardized and physical stream covariates are included.

spawning and larval phases during accessible periods of stream occupancy, which spans several years (Applegate, 1950).
Annual surveys that quantify larval abundance, body size (a surrogate of age), and relative distribution are used to prioritize streams for treatment with the lampricide 3-trifluoromethyl-4-nitrophenol (TFM; Jubar et al., 2021). Separating age classes with length alone is difficult, particularly for larger larvae (Dawson et al., 2009). While additional information about the stream environment, such as growing degree days, can improve cohort determination using length, accurate age estimation remains challenging (Dawson et al., 2020).
The number of spawning adults entering streams is assessed using annual trapping and capture-mark-recapture (CMR) methods in several index streams in each Great Lake (Adams et al., 2021). CMR is an effective technique for estimating the total number of adults that could spawn in the stream (i.e., adult census population size; N c). However, the sampling framework cannot be implemented in a large number of streams due to both environmental conditions that complicate assessment and the high costs associated with evaluating a larger number of streams (Robinson et al., 2021). Furthermore, the assumptions associated with mark-recapture adult abundance estimates have not been fully tested.
Violation of assumptions of CMR models, including behaviors affecting probability of capture, trap efficiency, and low recapture rates, can complicate adult abundance estimates in systems where CMR is conducted (Bravener & McLaughlin, 2013). Additionally, trapping efforts provide information on lamprey entering streams, rather than the number of spawning individuals in sampled tributaries. Environmental variables like stream drainage area, amount of larval habitat, and the number of years since TFM treatment can affect census size estimates of spawning sea lamprey (Mullett et al., 2003).
Complementary data to spawning- and larval-phase sea lamprey assessment, including genetic estimates of the effective and minimum number of breeding individuals (N b and N s, respectively) as surrogate measures of adult abundance, could improve the quantification of this control performance measure, especially in locations where trapping is difficult or intensive monitoring is not possible.
Genomic data and reconstructed pedigrees can provide population-level inference regarding estimates of the minimum number of spawning adults (N s) and the effective number of breeding adults (N b), which can be used to fill information gaps in the absence of information on N c and inform management actions for invasive species (Weise et al., 2020). Effective population size (N e) is a parameter describing the size of an idealized population that experiences drift or inbreeding at the same rate as the sampled population (Wright, 1931). N e is generally calculated on a generational basis (Waples, 2016a; Waples et al., 2014), but the effective number of breeding adults (N b) can be estimated for individual cohorts with appropriate sampling. Notably, the accidental inclusion of multiple cohorts can bias N b estimates (Robinson & Moyer, 2013; Wang, 2009; Waples, 2005; Waples & Antao, 2014). N b estimation can be complicated by skewed sex ratios, large census size, and highly dispersed species distribution and sampling regimes (Waples et al., 2018).
Since N b is estimated for a single reproductive event, it is used to assess species for early detection of population decline or extinction risk (Jay et al., 2014; Kamath et al., 2015). For invasive species, N b can be used to track growth and spread, or to evaluate management intervention (Weise et al., 2020). Several methods are used to estimate N b. The linkage disequilibrium (LD; Waples & Do, 2010) method uses non-random associations between alleles in a set of loci to estimate N b. LD can be caused by physical linkage through proximity in the genome and by finite breeding population size (Hill, 1981, p. 19). Correlations in allele frequencies between loci that are not physically linked thus provide an estimate of N b.
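The logic of the LD method can be illustrated with a minimal sketch. It assumes unlinked loci, random mating, and the first-order expectation E[r²] ≈ 1/(3 N b) + 1/S, where r² is the mean squared correlation in allele frequencies across locus pairs and S is the number of sampled individuals. This is a simplification for illustration only: the function name is hypothetical, and estimation software applies additional bias corrections that are not reproduced here.

```python
def nb_from_mean_r2(mean_r2: float, sample_size: int) -> float:
    """Approximate N_b from the mean r^2 between unlinked loci.

    Simplified relation (assumption): E[r^2] ~ 1/(3*N_b) + 1/S.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Remove the part of r^2 expected from finite sample size alone.
    r2_drift = mean_r2 - 1.0 / sample_size
    if r2_drift <= 0:
        # No LD signal beyond sampling noise: the estimate is unbounded.
        return float("inf")
    return 1.0 / (3.0 * r2_drift)
```

A non-positive drift component is returned as infinity, which mirrors the common situation in which a small sample cannot distinguish a large breeding population from an infinite one.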
In contrast, the sibship frequency (SF; Wang, 2009) method uses reconstructed pedigrees to estimate N b based on the frequency of full- and half-sibling relationships present among sampled offspring.
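As a hedged illustration of the sibship frequency idea, the sketch below takes a reconstructed pedigree as a list of (father, mother) identifiers per offspring and computes N b from the proportion of offspring pairs that share a father or a mother. It assumes random mating (the inbreeding-deviation term of the full estimator is set to zero), so that 1/N b = (Q1 + Q2 + 2 Q3)/4, where Q1, Q2, and Q3 are the frequencies of paternal half-sibling, maternal half-sibling, and full-sibling pairs. The function name and input layout are hypothetical.

```python
from itertools import combinations


def nb_sibship_frequency(offspring):
    """Estimate N_b from a list of (father_id, mother_id) tuples.

    Simplified form assuming random mating:
    1/N_b = (pairs sharing a father + pairs sharing a mother) / (4 * pairs).
    A full-sibling pair counts once for each shared parent.
    """
    pairs = list(combinations(offspring, 2))
    if not pairs:
        raise ValueError("at least two offspring are required")
    share_father = sum(1 for a, b in pairs if a[0] == b[0])
    share_mother = sum(1 for a, b in pairs if a[1] == b[1])
    sib_freq = (share_father + share_mother) / (4.0 * len(pairs))
    # With no related pairs in the sample the estimate is unbounded.
    return float("inf") if sib_freq == 0 else 1.0 / sib_freq
```

For example, four offspring with parents (F1, M1), (F1, M1), (F2, M2), and (F2, M3) form six pairs, of which two share a father and one shares a mother, giving N b = 8.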
Another parameter that can be quantified from genetically determined pedigrees is the asymptotic number of spawning adults. The number of unique parental genotypes present in a reconstructed pedigree provides an estimate of the minimum number of spawning adults that produced the sampled offspring (N s). Estimates can then be extrapolated to the number of successfully spawning adults, minimizing limitations associated with sample size (provided that the sample is representative of the entire population), by estimating the asymptote of the pedigree accumulation curve of unique parental genotypes (Israel & May, 2010; Rawding et al., 2014). Similar to a species accumulation curve in community ecology (Chao, 1987), unique parental genotypes are accumulated as the number of sampled offspring increases. Thus, methods more commonly applied to estimate total species richness from ecological datasets (e.g., Chao, 1987) can be applied to estimate the total number of spawning adults in a system from reconstructed pedigrees (N̂ s; Sard et al., 2021).
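The extrapolation can be sketched as follows, treating each sampled offspring as a "site" and each inferred parent as a "species". The code uses the incidence-based Chao estimator with a small-sample correction, N̂ s = N s + ((n − 1)/n) · q1² / (2 · q2), where q1 and q2 are the numbers of parents detected in exactly one and exactly two offspring. How the parent-by-offspring incidence data were actually assembled for analysis is an assumption here, as are the function name and the requirement that father and mother identifiers do not overlap.

```python
from collections import Counter


def chao_parents(offspring):
    """Return (N_s, Chao estimate) from a list of (father_id, mother_id) tuples."""
    n = len(offspring)
    if n == 0:
        raise ValueError("no offspring supplied")
    # Number of sampled offspring assigned to each inferred parent.
    counts = Counter(parent for pair in offspring for parent in pair)
    n_s = len(counts)
    q1 = sum(1 for c in counts.values() if c == 1)  # parents seen once
    q2 = sum(1 for c in counts.values() if c == 2)  # parents seen twice
    correction = (n - 1) / n
    if q2 > 0:
        unseen = correction * q1 * q1 / (2 * q2)
    else:
        # Bias-corrected form used when no parent is seen exactly twice.
        unseen = correction * q1 * (q1 - 1) / 2
    return n_s, n_s + unseen
```

When most parents are detected only once, the estimated number of undetected parents grows quickly, which is consistent with accumulation curves that have not reached an asymptote.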
Advances in genotyping technology have allowed large-scale genetic-based population assessments with large sample sizes and genome-wide SNP data, including for sea lamprey (Sard et al., 2020; Weise et al., 2020). Genomic technologies like restriction site associated DNA sequencing (RADseq; Baird et al., 2008) and RAD-capture sequencing (Ali et al., 2016) allow for high-throughput genotyping.
Furthermore, a chromosome-level and a germline genome have been assembled for sea lamprey (Smith et al., 2013, 2018) and a chromosomal-anchored RAD-capture SNP panel was recently developed (Sard et al., 2020), allowing for sequencing of large numbers of individuals at a targeted set of polymorphic loci. Population genetic data sets could be generated annually using these techniques, and used to quantify spawning abundance and provide additional information for monitoring and control under an adaptive management framework.
In this study, our objective was to evaluate the utility of N b and N s estimates in larval sea lamprey collections in 17 Great Lakes tributaries and quantify the influences of stream environmental, biotic, and sampling variables on genetic estimates of spawning adult abundance. Additionally, CMR estimates of N c in a group of streams with annual adult trapping assessments were included to allow for comparisons between N c estimates and N b and N s. Because aspects of sampling regimes vary in efficiencies and effective sampling areas covered, results from genetic-based assessments may vary among stream reaches and across streams depending on the sampling regimens used to collect genotyped individuals (Hunter et al., 2020).
Consequently, management decisions may also be affected by sampling protocols. Data developed by this project allow assessment of the utility of these genetic estimates for characterizing stock-recruitment dynamics and evaluating the success of sea lamprey control measures targeting the adult life stage that also consider important stream and sampling covariates.

| Sample collection
Sea lamprey larvae (n = 1877) were collected by backpack electrofishing during larval assessment surveys in 17 streams across the Great Lakes basin (Figure 1) by collaborators from the United States Fish and Wildlife Service, United States Geological Survey, and Fisheries and Oceans Canada. Collections occurred at a single time for each stream in the summer and fall of 2019, with the exception of the Middle River, where collections occurred in the summer of 2017.
Larval collections were made opportunistically in transects of approximately 200 m in areas of the stream where annual assessment was generally conducted. Sampled tributaries ranged from large rivers like the Muskegon River (drainage area 7327 ha) to small streams like Swan Creek (drainage area 5 ha). All systems annually recruit larval sea lamprey and received TFM treatments within 5 years of sample collection. Thirteen of the 17 streams are adult CMR index streams, although only 10 streams had a CMR abundance estimate associated with the spawning year of collected larvae. Larvae suspected to be age-1 or age-0 based on estimated length in the field were prioritized for collection over larger individuals in all 2019 samples. Age-1 individuals were primarily collected and sequenced for genetic analysis (Table 1). Age-0 individuals were only available during August and September, when larvae from early spawning adults had grown to a size that was susceptible to larval assessment surveys.
At each collection site, larvae were identified to species using morphology (Potter & Gill, 2003), anesthetized with MS-222, preserved in 95% ethanol, and returned to the lab for processing (IACUC Approval number: PROTO201800143). Individual lengths were recorded and a tissue sample from each larva was taken for DNA extraction and sequencing. Body length, time of sample collection, and years since TFM treatment were subsequently used to determine whether the collected individuals were age-1 or age-0 larvae (Table 1).

| Sequencing library preparation
DNA extractions were performed using blood and tissue kits (Qiagen DNeasy, QIAGEN, Carlsbad, CA), following manufacturer protocols.
DNA concentrations were quantified with a spectrophotometer (Nanodrop ND-1000, ThermoFisher Scientific) and verified using assay kits (Quant-iT™ PicoGreen™ dsDNA, Thermo Fisher Scientific Inc.) with a real-time PCR system (QuantStudio 6 Flex, Thermo Fisher Scientific Inc.). DNA extractions were diluted to <100 ng/μl for RAD library preparation. Samples from each stream were randomly distributed across libraries to minimize the potential for library effects.
Reduced representation libraries were constructed using a modified version of the BestRAD protocol (Ali et al., 2016). Briefly, DNA was digested with the SbfI restriction enzyme, and biotinylated BestRAD adapters were ligated to samples to serve as individual barcodes. The barcoded DNA was pooled, concentrated with Ampure beads (Beckman Coulter), and sheared to 325 bp using a focused-ultrasonicator (Covaris m220, Covaris). DNA fragments with attached BestRAD tags were selected using a streptavidin bead binding assay, and size selection was used to select target size fragments of 300 bp. A 22:50 ratio of Ampure beads to sample was used to select long fragments and a 13:72 ratio was used to separate target size fragments from short fragments. Sample preparation kits (NEBNext, New England BioLabs Inc.) were used to ligate plate-specific Illumina adaptors and an Illumina universal adapter was used to prepare the library for sequencing. Libraries were pooled in groups of four, and then a panel of ~3400 SNPs was targeted for sequencing using a custom hybridization capture kit (MyBaits, Arbor Biosciences), designed by Sard et al. (2020), with the manufacturer protocol and 11 PCR cycles in the final amplification step. Libraries were sequenced on a total of five sequencing lanes (Illumina HiSeq X) at Novogene with paired-end 150 bp sequencing.

Note: Treatment year and month refer to the most recent lampricide treatment that occurred in the stream; the years since treatment variable refers to the number of years between the most recent treatment and spawning year. N c is the census-size estimate based on mark-recapture for the years 2016 and 2018; the number in parentheses is the coefficient of variation as calculated by Adams et al. (2021); the asterisk indicates that no coefficient was available (N. Johnson, personal communication). Drainage refers to the drainage area of the stream (in hectares). Sample Sites refer to the number of collection locations for the larval collections, and sampling distance is the estimated linear distance that sampling occurred in the stream (in km). For streams with one sampling site, the standard transect length for larval sampling (0.2 km) is used.

| Bioinformatic analysis
and Seth Smith, University of Montana) and demultiplexed using the Stacks 2.0 (Catchen et al., 2013) function process_radtags. Cloned reads were removed from each individual with the Stacks 2.0 function cloneFilter (Catchen et al., 2013), and reads were trimmed and quality filtered with Trimmomatic (Bolger et al., 2014) with a minimum length of 50, a sliding window of four bases, and a minimum quality score of 15. BWA-mem (Li, 2013; Li & Durbin, 2010) was then used to map all reads to the sea lamprey chromosome-level reference genome (Timoshevskaya et al., 2023). SAMtools (version 1.9; Li et al., 2009) was used to sort mapped reads. The sorted reads were genotyped using the Stacks function gStacks (Catchen et al., 2013), and a sorted VCF file was generated along with population-level sta-

| N b and N s estimates
For all cohorts, two estimates of N b along with N s were generated. Estimates from the linkage disequilibrium method (Waples & Do, 2008) were calculated using NeEstimator (Do et al., 2013). A p crit value of .05 was specified to exclude loci with low minor allele frequency, and locus pairs within chromosomes were excluded from the calculation of correlation in allele frequency to avoid the effects of physical linkage (Waples, 2016a). Confidence intervals were estimated in NeEstimator using the provided jackknife method. Colony version 2.0.6.6 (Jones & Wang, 2010) was run for each stream population to reconstruct the pedigrees of each system and generate an estimate of N b using the sibship frequency method. The full-likelihood approach with a medium-length run was used for all streams. Other input parameters changed from default settings were unknown allele frequencies, polygamous mating, and no sibship scaling or prior sibship reported (Wang & Santure, 2009). Additionally, the mean (k) and variance (V k) of adult reproductive success for contributing adults were calculated for each reconstructed pedigree from each stream. The number of unique parental genotypes was recorded from the reconstructed pedigree (N s) and the total number of parents in the stream (N̂ s) was estimated using pedigree accumulation analysis as described by Sard et al. (2021). The number of unique parental genotypes in the reconstructed pedigree was quantified as the offspring sample size increased and extrapolated using the Chao estimator (Chao, 1987) with the function specpool from the R package vegan (Oksanen et al., 2019).

Input variables used for the generalized linear models are summarized in Table 1.
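The mean and variance in reproductive success described above can be sketched from a reconstructed pedigree as follows. This is a minimal illustration, not the analysis code: the function name and input layout are hypothetical, the sample variance (n − 1 denominator) is an assumption, and only parents that contributed at least one sampled offspring are counted, since adults with no sampled offspring do not appear in the pedigree.

```python
from collections import Counter
from statistics import mean, variance


def reproductive_success(offspring):
    """Return (k, V_k) from a list of (father_id, mother_id) tuples.

    k is the mean number of sampled offspring per inferred parent and
    V_k is the sample variance of that count (assumed n - 1 denominator).
    """
    counts = Counter(parent for pair in offspring for parent in pair)
    per_parent = list(counts.values())
    if not per_parent:
        raise ValueError("no offspring supplied")
    k_bar = mean(per_parent)
    # Variance is undefined for a single parent; report 0.0 in that case.
    v_k = variance(per_parent) if len(per_parent) > 1 else 0.0
    return k_bar, v_k
```

A pedigree dominated by a few large families yields a high V k relative to k, which is the pattern expected to lower N b relative to the number of spawners.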

| Statistical analyses
Generalized linear models with the above environmental variables as independent variables were generated for estimates of N b (both LD and SF methods), N̂ s, and V k. Predictor variables were evaluated for collinearity using variance inflation factors, calculated with the vif function from the R package car (Fox & Weisberg, 2019).
Model selection was conducted using the Akaike Information Criterion (Akaike, 1974) with a correction applied for small sample size (AICc; Burnham & Anderson, 2002), calculated with the dredge function in the MuMIn R package (Barton, 2022). Models with ΔAICc < 2 were included in the confidence set for each predictor, and coefficients were averaged across these models using the model.avg function, and are evaluated based on the results of the dredge function.

This additional correlation for streams with large sample size was generated after sample size was found to be an important predictor of both N b and N s estimates in the linear models described above.
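The ΔAICc < 2 confidence set used for model averaging can be illustrated with a short sketch. It computes AICc = −2 ln L + 2K + 2K(K + 1)/(n − K − 1) for each candidate model, keeps models within the cutoff of the best model, and assigns Akaike weights. The function names are hypothetical, and renormalizing the weights within the retained set is an assumption about how averaging over a subset is handled.

```python
import math


def aicc(log_lik: float, n_params: int, n_obs: int) -> float:
    """Small-sample corrected AIC for one fitted model."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("too few observations for the AICc correction")
    aic = -2.0 * log_lik + 2.0 * n_params
    return aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def confidence_set(models, n_obs, cutoff=2.0):
    """models maps a model name to (log-likelihood, number of parameters).

    Returns {name: (delta_AICc, weight)} for models with delta < cutoff,
    with weights renormalized to sum to one within the set (assumption).
    """
    scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in models.items()}
    best = min(scores.values())
    deltas = {m: s - best for m, s in scores.items() if s - best < cutoff}
    raw = {m: math.exp(-0.5 * d) for m, d in deltas.items()}
    total = sum(raw.values())
    return {m: (deltas[m], raw[m] / total) for m in deltas}
```

With only 17 streams, the correction term is not negligible: a model with K = 3 parameters carries an extra penalty of about 1.85 AICc units relative to uncorrected AIC.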

| N b , N s , and N̂ s estimates
The number of families and the size of half-sibling networks varied widely among streams, implying large differences in spawning population size and variance in individual reproductive success across streams. For example, Swan Creek and Bad River had a small number of full-sibling families, and comparatively few half-sibling families (Figure 2). Locations like the Ocqueoc River and the Muskegon River were represented by mostly unrelated individuals (Figure 2), while the Middle River pedigree was composed of many small, interconnected half-sibling families. In contrast, Misery River was characterized by a comparatively smaller number of highly interconnected half-sibling families (Figure 2).
In most systems, N b estimates from LD and SF were of similar magnitude (Table 2). In systems where the LD method did not agree with the SF method, or when the confidence intervals did not overlap, LD estimates were generally lower (Table 2). Estimates of  and Swan Creek (k = 10.2). Confidence intervals for the LD method were potentially artificially narrow due to the large number of SNPs used in the analyses (Waples et al., 2022), although the corrected jackknife estimates should reduce that bias. All seven streams with small sample size (n < 50) had LD N b estimates below 100. Of these locations, two had mark-recapture estimates of over 10,000 and N s estimates of less than 100 (Table 2).
Estimates of N̂ s were generally higher than the N b estimates for each cohort. Some systems had a small sample size (less than 50 individuals), but finite N̂ s estimates were calculable because all stream subpopulations contained some related individuals (Table 2). The Bad River and the East Au Gres River produced small N̂ s estimates where accumulation curves reached their asymptotes within the sample (Figure 3). In contrast, N̂ s estimates were larger in the Manistee and Middle Rivers, and an asymptote in the relationship between sampled offspring and the number of unique parental genotypes was not reached (Figure 3).
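Whether an accumulation curve reaches an asymptote within the sample can be examined with a simple resampling sketch. The code below averages the number of unique parents detected as offspring are added in random order; the number of permutations, the seed, and the function name are illustrative assumptions rather than settings used in the study.

```python
import random


def accumulation_curve(offspring, n_perm=100, seed=0):
    """Mean number of unique parents after 1..n sampled offspring.

    offspring is a list of (father_id, mother_id) tuples; the curve is
    averaged over n_perm random orderings of the offspring.
    """
    rng = random.Random(seed)
    n = len(offspring)
    totals = [0.0] * n
    for _ in range(n_perm):
        order = list(offspring)
        rng.shuffle(order)
        seen = set()
        for i, (father, mother) in enumerate(order):
            seen.update((father, mother))
            totals[i] += len(seen)
    return [t / n_perm for t in totals]
```

A curve that is still rising at the final sampled offspring indicates that additional sampling would likely detect more parents, in which case the extrapolated estimate carries greater uncertainty.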

| Correlations and linear modeling
A correlative relationship between stream estimates of N c and N b or N̂ s was not evident when data from all sampled locations with mark-recapture census size estimates were considered (Figure 4a). However, when streams with small sample size (n < 50) were removed from the analysis, correlation coefficients for the relationship be-

The VIF analysis found collinearity among the full set of environmental, biotic, and sampling variables that could be considered in the models. When variables were subset to sample size, sampling distance, years since TFM treatment, and drainage area, VIF results decreased to acceptable levels (GVIF < 2), indicating that collinear variables were successfully removed. The variables that were removed were the month of TFM treatment, which was correlated with years since TFM treatment, and the number of sampling sites, which was correlated with sampling distance.
Sample size and sampling distance were found to be important predictors in the confidence set of models for N b, N̂ s, and V k estimates (Table 3). Drainage area was an important predictor in both N b and N̂ s models, and years since TFM treatment was an important predictor in the N̂ s and V k models (Table 3). Models incorporating an interaction between sample size and sampling distance were also included in the confidence set for N̂ s and V k with a negative effect (Table 3). However, in the N̂ s and V k models, the model-averaged coefficients for several predictors had confidence intervals that overlapped zero, indicating that there is uncertainty in the directional effect of those predictors (Figure 5). In the N b models, confidence intervals for sampling distance overlapped zero, but coefficients for sample size and drainage area suggest positive effects on N b estimates for both predictors (Figure 5).

| DISCUSSION
Genetic assessment of 17 sea lamprey-producing streams provided information pertaining to sea lamprey reproductive ecology, including mean and variance in reproductive success and full- and half-sibling family size, that could not be obtained from other types of adult assessment. Variability in estimates of N b and N s among streams was notable, indicating that sea lamprey-producing streams have vastly different numbers of successfully spawning adults (Table 2). The number and size of full- and half-sibling families in each stream varied widely, further illustrating variation among Great Lakes tributaries and highlighting the importance of evaluating lamprey spawning populations on a per-stream basis to inform management decisions. Patterns of family structure ranged from cohorts comprised of a small number of full-sibling families, to those consisting of mostly unrelated individuals, to cohorts with large, interconnected half-sibling groups (Figure 2). Variation in the number of spawning adults and their subsequent spawning success may be driven partially by the sea lamprey's lack of homing behavior (Bergstedt & Seelye, 1995; Waldman et al., 2008) and dependence on larval olfactory cues during adult spawning migrations (Wagner et al., 2009). Sample size and the sampling distance were identified as influential predictors of N b and N̂ s in the confidence set of models, along with physical stream features and past stream management regimes, emphasizing the importance of representative sampling.
While N b and N̂ s were not significantly correlated with adult census size in the full dataset, relationships were marginally non-significant for well-sampled tributaries (Figure 4b), suggesting that these population genetic parameters could serve as useful surrogates for spawning adult census size. Thus, population genomic datasets like those generated here provide a promising approach for population assessment, particularly in systems where adult trapping is difficult

| Sampling schemes and generalized linear models
Across the streams surveyed, the importance of sampling design, particularly sample size, was highlighted in findings from generalized linear models and correlations between N b and N c CMR estimates.
When streams with small sample size were removed from the dataset, larger correlation coefficients were estimated between N b and N̂ s estimates and N c (Figure 4). Collectively, results indicate that the opportunistic sampling scheme used to collect sea lamprey larvae in this study, which produced small sample sizes in several streams, may have led to inaccurate representations of larval family composition in some streams. Beyond the sampling variables noted here, there were also environmental predictors with effects on genetic estimates that were identified as important in our linear models when all streams were included in models, as discussed below.
The relationship between N b and N s in sampled Great Lakes streams was shown to be associated with elements of the sampling regime (larval numbers collected, number of sampling sites, and lengths of stream surveyed), stream and environmental factors (drainage area), and management control efforts (recency of TFM treatment). Previous work estimating N c in sea lamprey considered variables similar to the environmental data used in the generalized linear models in this study (Mullett et al., 2003). These variables included drainage area and years since the last TFM treatment, which were included in some of the best models describing associations with N b and N s estimates (Table 3). In the N b models, drainage area and sample size had a positive effect on both estimates of N b, with similar effect sizes, indicating that increasing sample size and larger drainage area lead to larger N b in our sampled streams. Sampling distance had a negative effect on N b estimates, implying that increasing the distance sampled decreased N b estimates in our sampled streams, which conflicts with our expectations. However, both coefficients have confidence intervals that overlap zero, implying some uncertainty in the direction of the effect. Years since TFM treatment had a negative effect on N̂ s but a positive effect on V k, implying that a recent TFM treatment would lead to less variation in reproductive success and a larger number of unique parent genotypes, the latter of which conflicts with expectations because a recent TFM treatment should decrease larval cue and thus the number of adults that enter a stream to spawn (Mullett et al., 2003). The coefficient for the N̂ s model did have confidence intervals that overlapped zero, however.

Note: Sample size indicates the number of sequenced offspring for the cohort. Linkage disequilibrium (LD) and sibship frequency (SF) columns provide effective breeding size (N b) point estimates and corresponding 95% confidence intervals. k and V k are the mean and variance in reproductive success of contributing parents, as inferred from the reconstructed pedigree. N s is the number of reconstructed parental genotypes for each cohort, and Chao is the point estimate for the extrapolated number of spawning adults (N̂ s) and 95% confidence interval.
those estimates. Low trap efficiency and variation in trap efficiency across years and index streams, as well as variation in catchability for individual lamprey could contribute to uncertainty in N c (Harper et al., 2018). In two of our streams, N c estimates were greater than 10,000, while the number of sampled offspring was less than 50 (Table 1), and the sample was collected from a

ratio (Hedrick, 2005). We found substantial variation in V k among streams (Table 2), which may obscure correlations between our N b estimates and N c estimates from CMR efforts. Tests for correlations between N b, N̂ s, and N c were conducted with and without locations with small samples and the relationships between N c and genetic estimates were strengthened when these systems were removed (Figure 4). When sample size and sampling distance were small, non-representative sampling may have resulted in downwardly biased N b and N̂ s estimates (Waples & Anderson, 2017).
This same bias applies when a small sample size is used for CMR estimates, compounding differences between genetic and CMR assessment. To reduce such bias, sampling would be best conducted at multiple stream locations that are distributed across the available larval habitat. We further assessed the influences of the sampling scheme in the Middle River, where three sampling sites had sufficient collections for an N b estimate (n > 30 individuals). We generated site-specific N b estimates for all three locations using the LD estimation methods and found that, while point estimates for individual sites ranged from 292 to 394, all three locations produced estimates with confidence intervals that included the estimate from the combined sample. Thus, our results would not have been markedly different for the Middle River, had our estimates been based on a more spatially restricted set of samples.

| Relationship between N b and N c
In previous studies, population estimates of N b and N c have not always exhibited the expected correlations (Bernos & Fraser, 2016; Whiteley et al., 2015), especially when the population size (N c) was large (Waples, 2016b). However, some studies have found a relationship when environmental factors and population dynamics, and their influences on the N b : N c ratio, could be taken into account (Ruzzante et al., 2016).
In particular, the importance of sufficient and representative sampling has been highlighted (Whiteley et al., 2012), which is consistent with results from our study (Table 3; Figure 4). Additional environmental variables not included in our models, including the amount and distribution of spawning habitat relative to sampling effort, density of spawning adults, and stream flow during spawning could also affect N b and N s (Whiteley et al., 2015). A further potential complication is

| Implications for management
Our study illustrates the potential of N b and N̂ s estimates as sources of information about sea lamprey spawning ecology in Great Lakes tributaries. Population genomic approaches provide tools for the annual assessment of sea lamprey spawning populations, which would be particularly beneficial in lamprey-producing streams where adult trapping is difficult or impossible. Furthermore, if index streams are to be assessed over a large number of years, families and cohorts can be tracked through time using pedigree reconstruction, providing insights on larval growth, dispersal, and survival in streams (Lewandoski et al., 2021). N b and N̂ s could also be used to evaluate the efficacy of supplemental control techniques including sterile male release and applications of chemical repellents/attractants to increase trapping efficiency or as barriers to upstream migration (Miehls et al., 2021; Siefkes et al., 2021).
These new control techniques are being evaluated for use in streams where TFM treatment and barriers are difficult to use, and as supplements for these techniques where the negative ecological effects of physical barriers and lampricides are high (Siefkes et al., 2021). Additionally, N b provides information on inbreeding, drift, and loss of diversity in the population, all of which can be used to further evaluate current and future control techniques. N̂ s estimates also can provide an annual metric of the number of successfully spawning lamprey in a stream, which could be used as an assessment metric complementary to N c. More in-depth assessment of lamprey spawning provides insights into successfully reproducing adult lamprey that can be incorporated into adaptive management frameworks to make more precise intervention decisions to reduce lamprey spawning populations.
Our study identified associations between widely used genetic pedigree-based parameters that are surrogates of mark-recapture spawning adult census size estimates and management and sampling

Note: Model in bold indicates the averaged model, and the other models included are models selected by an AICc cutoff of 2 to be included in the average. The "+" indicates a parameter included in the model, and "a" indicates an interaction between parameters.
Abbreviations: AICc, Akaike Information Criterion corrected for small sample size; Chao, Estimate of minimum number of spawning adults extrapolated using the Chao Method; df, degrees of freedom; LD-N b, Estimate of effective breeding size using the linkage disequilibrium method; SF-N b, Estimate of effective breeding size using the sibship frequency method; V k, Variance in reproductive success.

ACK N OWLED G M ENTS
We thank Dr. Jean Adams for her insights into the project design and data analysis, and Dr. Aaron Jubar for his insights into the project design and help coordinating the field work for the project.
We thank field work technicians with USFWS, USGS, and DFO for their work collecting samples for the project, Gabrielle Sanfilippo and Bailey Lorencen for their assistance with laboratory work, and ICER at Michigan State University for computational resources required to analyze the data. We thank the members of the Scribner and Robinson labs for their feedback on the project and early drafts of the manuscript. We also thank the GLFC for their funding used for the project (2019_ROB_540840). Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the US Government.

FUNDING INFORMATION
We thank the Great Lakes Fishery Commission (GLFC) for their funding used for the project (2019_ROB_540840).

DATA AVAILABILITY STATEMENT
Sequencing read data will be uploaded to NCBI SRA upon ac-

A bioinformatic pipeline based on Sard et al. (2020) and Weise et al. (2020) was used to process sequencing data. First, reads were oriented with a custom Perl script bRAD_flip (originally developed by Paul Hohenlohe, University of Idaho, and modified by Brian Hand

FIGURE 1 Map showing all sampled streams with their location in the Great Lakes system. Each dot represents a lake terminus of a stream system; labels indicate the name of the stream system in subsequent tables and figures.

Environmental, biotic, and sampling data for each stream.

Aspects of the sampling regime, biotic factors, and the physical environment may all influence estimates of N b and N s in the sampled systems. For instance, if larval sample sizes are small, or if sampling is not representative of the genotype diversity in the stream, N b and N s estimates may be biased. Linear models were used to assess the influence of several factors on N b, N s, and V k for the sampled systems. Publicly available reports on current sea lamprey control and assessment were used to collect information on TFM treatment years (Barber & Steeves, 2019; Mullett & Sullivan, 2017; Steeves & Barber, 2020). Personal communications with collaborators and unpublished data from co-authors were used to obtain data on the drainage area for each stream (J. Adams, U.S. Geological Survey, personal communication, September 2020).

Input variables used for avg function, and are evaluated based on the results of the dredge function.

Relationships between CMR estimates of adult census size (N c; Barber & Steeves, 2019; Mullett & Sullivan, 2017; Steeves & Barber, 2020) and estimates of N b and N s across streams were evaluated via Pearson product-moment correlation tests. Correlations were assessed with all possible data points (N = 10 streams), and again after removing streams with small sample sizes (n < 50; N = 7 streams).
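The two-pass correlation analysis described above (all streams, then only streams with at least 50 sampled larvae) can be sketched as below. This is a rough illustration under stated assumptions: the stream labels, sample sizes, and N c and N b values are invented for the example, `pearson_r` is a hypothetical helper, and the sketch computes only the correlation coefficient, not the accompanying significance test.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical records: (stream label, larval sample size n, N c, N b)
streams = [
    ("A", 120, 900, 150), ("B", 35, 2500, 12), ("C", 210, 3100, 300),
    ("D", 48, 400, 40), ("E", 95, 1200, 110), ("F", 60, 700, 55),
    ("G", 20, 1800, 5), ("H", 150, 2000, 220), ("I", 75, 500, 60),
    ("J", 180, 2600, 260),
]

MIN_SAMPLE_SIZE = 50  # threshold used in the text (n < 50 removed)

# First pass: every stream
r_all = pearson_r([s[2] for s in streams], [s[3] for s in streams])

# Second pass: drop streams with small larval sample sizes
filtered = [s for s in streams if s[1] >= MIN_SAMPLE_SIZE]
r_filtered = pearson_r([s[2] for s in filtered], [s[3] for s in filtered])

print(f"all streams (N = {len(streams)}): R = {r_all:.2f}")
print(f"n >= {MIN_SAMPLE_SIZE} (N = {len(filtered)}): R = {r_filtered:.2f}")
```

Comparing the two coefficients side by side shows how much of the apparent relationship, or lack of one, is driven by poorly sampled streams.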

N b varied widely across the sampled cohorts. The largest estimates occurred in the Middle River (N b = 228-353) and the Muskegon River (N b = 249-367), while six streams produced at least one N b estimate under 10 (

tween N c and all three genetic estimates increased (R = .65-.74), indicating that small sample sizes may have introduced a bias in genetic estimates (Figure 4b). N b and N s estimates were positively correlated (SF-based N b and N s: R = .954; LD-based N b and N s: R = .951).

FIGURE 2 Diagrams of reconstructed larval sea lamprey pedigrees for all stream systems indicating the contrast in inferred spawning adult numbers and mating complexity. Larvae are arranged along the center of each stream diagram and are connected by gray lines to their reconstructed parents, which are indicated by dots. The offspring are sorted by parent 1 sibling groups, then parent 2 sibling groups.

or impossible. However, the accuracy of N b and N s estimates depends critically on experimental design.
Given variation in sample sizes across the streams in our study, and the consistent inclusion of sampling attributes in linear models relating N b and N s to stream-specific covariates, it seems likely that stream cohorts characterized based on smaller sample sizes did not adequately characterize the spawning populations that produced the sampled cohorts. Sources of uncertainty with N c estimates could have further complicated correlations involving

TABLE 2 N b and N s estimates and population-based information.

FIGURE 3 Pedigree accumulation curves (Sard et al., 2021) describing increases in numbers of unique parent genotypes inferred (N s) as a function of increases in the number of sampled offspring accumulated in each stream. Estimates of N s are indicated by the dark red line on each plot.

single site in the stream. In these systems, non-random sampling may have produced a downward bias in N b and N s estimates. Additionally, if the system had high variance in reproductive success, the outsized effect of a few parents would decrease the N b : N c

FIGURE 4 Evaluation of relationships between estimates of effective breeding size (N b) and minimum number of spawning adults (N s) and mark-recapture census size estimates (N c) for sequenced stream populations. Correlation coefficients (R) and p-values are listed in each subplot; each relationship is shown with a blue line and a gray outline for the uncertainty. Due to low correlation coefficients, relationships are not visualized for the top row of subplots.

genetic compensation, when variation in reproductive success decreases in small populations, inflating N b estimates compared to N c (Ardren & Kapuscinski, 2003; Whiteley et al., 2015). The increase in correlation between N b and N c estimates after removing small sample size streams in this study, and the positive relationships between factors associated with sampling regimes and N b estimates in our models, underscore the need for large and representative sampling when estimating N b from population genomic data.
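Because the argument above turns on variation in reproductive success, it may help to see how mean reproductive success (k) and its variance (V k) can be tallied from a reconstructed pedigree. The sketch below is illustrative only: the offspring and parent identifiers are made up, `reproductive_success` is a hypothetical helper, and it uses the sample variance, which is an assumption since the source does not state which variance estimator was applied. Parents with no sampled offspring cannot appear in such a tally.

```python
from collections import Counter
from statistics import mean, variance

def reproductive_success(assignments):
    """Return (mean k, sample variance V k) of offspring counts per inferred parent.

    assignments: iterable of (offspring_id, parent1_id, parent2_id) tuples.
    Requires at least two distinct parents, since a sample variance is computed.
    """
    counts = Counter()
    for _, parent1, parent2 in assignments:
        counts[parent1] += 1
        counts[parent2] += 1
    k = list(counts.values())
    return mean(k), variance(k)

# Hypothetical pedigree assignments (not data from the study)
pedigree = [
    ("o1", "A", "B"), ("o2", "A", "B"), ("o3", "A", "C"),
    ("o4", "D", "E"), ("o5", "A", "E"), ("o6", "D", "B"),
]

k_bar, v_k = reproductive_success(pedigree)
print(f"mean k = {k_bar:.2f}, V k = {v_k:.2f}")
```

In this toy pedigree a single parent ("A") contributes to four of the six larvae, which is the kind of skew that raises V k.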
regimes and stream physical features. Future genetic work with stratified or strategic larval sampling could provide additional insight into the relationship between N b, N s, and N c, and clarity on the environmental factors that influence sea lamprey spawning abundance in Great Lakes tributaries. Additionally, by pairing annual evaluation of N b and N s with multi-year pedigree reconstruction (Bergstedt & Seelye, 1995; Waldman et al., 2008), future studies can both quantitatively assess control effectiveness and improve our understanding of the larval phase of the sea lamprey life cycle.

TABLE 3 Results from environmental, biotic, and sampling linear models.
+ Sample size + Sampling distance + Years since TFM treatment + Sample size: since TFM treatment
+ Sample size + Sampling distance + Sample size: Sampling distance
Columns: model, df, AICc, ΔAICc
V k ~ Sample size + Sampling distance + Years since TFM treatment; df = 5; AICc = 178.148; ΔAICc = 0.000
V k ~ Sample size + Sampling distance + Years since TFM treatment + Sample size: Sampling distance

AUTHOR CONTRIBUTIONS
Ellen M. Weise: Conceptualization (equal); formal analysis (lead); investigation (equal); methodology (lead); validation (equal); visualization (lead); writing - original draft (lead); writing - review and editing (lead). Kim T. Scribner: Conceptualization (equal); funding acquisition (equal); methodology (equal); supervision (equal); writing - review and editing (equal). Olivia Boeberitz: Formal analysis (equal); methodology (equal); writing - review and editing (equal). Gale Bravener: Funding acquisition (equal); project administration (equal); writing - review and editing (supporting). Nicholas S. Johnson: Conceptualization (equal); funding acquisition (equal); project administration (equal); writing - review and editing (equal). John D. Robinson: Conceptualization (equal); funding acquisition (equal); supervision (equal); writing - review and editing (lead).

Table 2). Variance in individual reproductive success (V k) was less than 50 across most systems (mean V k = 22.1, range = 0.1-177.8; Table 2), with the exception of Pigeon River (V k = 177.8) and Swan Creek (V k = 56.6). Mean individual reproductive success was less than 10 across the majority of streams (mean k = 4.4, range = 1.2-15.2; Table 2), with higher k in Pigeon River (k = 15.2)

FIGURE 5 Visualization of model-averaged coefficients describing environmental and response variables for generalized linear models generated for four different genetic estimates. Coefficients whose error bars do not overlap zero are shown in blue; coefficients whose error bars overlap zero are shown in red. Zero is plotted with a black dotted line on each subplot.

ceptance of manuscript. Length data and environmental variable data are available on GitHub (https://github.com/weiseell/NbdLamprey2).